-
Notifications
You must be signed in to change notification settings - Fork 3
Database configuration and migration #42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
6148a0b
to
2250ed3
Compare
2250ed3
to
12d3063
Compare
7d2c0c1
to
6e94ae8
Compare
6e94ae8
to
ed959fb
Compare
wp-includes/sqlite-ast/class-wp-sqlite-information-schema-reconstructor.php
Outdated
Show resolved
Hide resolved
wp-includes/sqlite-ast/class-wp-sqlite-information-schema-reconstructor.php
Outdated
Show resolved
Hide resolved
wp-includes/sqlite-ast/class-wp-sqlite-information-schema-reconstructor.php
Show resolved
Hide resolved
wp-includes/sqlite-ast/class-wp-sqlite-information-schema-reconstructor.php
Outdated
Show resolved
Hide resolved
wp-includes/sqlite-ast/class-wp-sqlite-information-schema-reconstructor.php
Show resolved
Hide resolved
wp-includes/sqlite-ast/class-wp-sqlite-information-schema-reconstructor.php
Show resolved
Hide resolved
wp-includes/sqlite-ast/class-wp-sqlite-information-schema-reconstructor.php
Outdated
Show resolved
Hide resolved
Great start! And also a good idea to use a cascade of data sources for backfilling. There's some more way to go, though — I've commented on a bunch of potential issues |
wp-includes/sqlite-ast/class-wp-sqlite-information-schema-reconstructor.php
Outdated
Show resolved
Hide resolved
wp-includes/sqlite-ast/class-wp-sqlite-information-schema-reconstructor.php
Show resolved
Hide resolved
private function escape_mysql_string_literal( string $literal ): string { | ||
// See: https://www.php.net/manual/en/mysqli.real-escape-string.php | ||
return "'" . addcslashes( $literal, "\0\n\r'\"\Z" ) . "'"; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmsnell can you poke any holes in this? We assume everything is UTF-8, and if it isn't then GIGO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be great to get more eyes on this. The context is:
- We know what the
DEFAULT 'some-value'
is in SQLite. - We want to convert the
'some-value'
part to MySQL for aDEFAULT 'some-value'
in an exported MySQLCREATE TABLE
statement. - What characters that come in raw from SQLite do we need to escape?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 or we could bale out of the escaping part of the problem:
CREATE TABLE `my_table` (
`id` INT NOT NULL,
`name` VARCHAR(45) DEFAULT (FROM_BASE64('aGV5IPCfpJQ='))
) ENGINE = InnoDB;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JanJakes see also https://github.com/adamziel/zs-sync/blob/trunk/lib/class.zs-sync-mysql-helper.php for escaping inspiration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adamziel Oh, that's an elegant solution! But here's the catch — https://dev.mysql.com/doc/refman/8.0/en/data-type-defaults.html — it only works for MySQL >= 8.0.13 😞
So, I was digging deeper, and it seems that PDO\SQLite always uses UTF-8. SQLite C API's can also work with UTF-16, but those are not used by PDO.
So generally speaking, escaping ASCII control characters in this way should be fine, we only need to make sure we list them all. It actually seems that mysqli.real-escape-string.php lists less of them than MySQL docs, so I'll need to verify that.
The only issue could be SQLite's GIGO policy — the fact that they don't enforce UTF-8 validity. We could strip off all invalid UTF-8 sequences, here's a neat trick:
htmlspecialchars_decode(htmlspecialchars($s, ENT_NOQUOTES | ENT_IGNORE, 'UTF-8'), ENT_NOQUOTES);
But in that case... what other values should we escape as well? Seems like SQLite's GIGO policy applies to the database schema as well, so an invalid UTF-8 sequence in a table or a column name could cause troubles too. Maybe that's beyond the scope here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anything wrong with embracing GIGO on our end. We just have to make sure we're working with a utf-8 connection for this reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, WordPress requires MySQL >= 8.0. We could be a little bit stricter and require 8.0.13 here if that makes it easier. We'll still have to solve the escaping in a medium term but maybe we don't have to block this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anything wrong with embracing GIGO on our end. We just have to make sure we're working with a utf-8 connection for this reason.
In PHP, the SQLite connection will always be UTF-8. However, if someone saves invalid UTF-8, and we later fetch it and try to manipulate it (e.g., escaping it), that could probably cause some issues, but that's the SQLite GIGO.
Also, WordPress requires MySQL >= 8.0.
Unfortunately, it's 5.5.5+ and 5 has still around 30% of the usage: https://wordpress.org/about/stats/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, it's 5.5.5+ and 5 has still around 30% of the usage: wordpress.org/about/stats
You're a treasure. Thank you for checking this and not just blindly accepting what I said 🙇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In PHP, the SQLite connection will always be UTF-8. However, if someone saves invalid UTF-8, and we later fetch it and try to manipulate it (e.g., escaping it), that could probably cause some issues, but that's the SQLite GIGO.
That's our backend and we can't do any better than that. Let's just embrace it. UTF-8 is a fine constraint to have and I don't think WordPress is actually compatible with any other encoding, other than by accident. We could even refuse to execute any non-utf8 SET NAMES
and CREATE TABLE
queries which would cleanly resolve the most difficult encoding issues. Even better, a hypothetical developers who uses latin2 would see the error, get annoyed, create an issue, and teach us about their use-case.
wp-includes/sqlite-ast/class-wp-sqlite-information-schema-reconstructor.php
Outdated
Show resolved
Hide resolved
wp-includes/sqlite-ast/class-wp-sqlite-information-schema-reconstructor.php
Show resolved
Hide resolved
I left some more notes. We're getting close here. Thank you for persevering @JanJakes! This is the last stretch before we can finally enjoy the new parser everywhere ❤️ I know I'm going deep on a piece of code we only need for a one-off migration, but a) it still matters and b) I think we'll reuse it later on for migrating between the database backends. |
abbdd34
to
9bc0ad0
Compare
35199fa
to
9bd0825
Compare
9bd0825
to
838967a
Compare
e972e47
to
c745a66
Compare
@@ -1234,14 +1209,14 @@ public function testColumnWithOnUpdate() { | |||
'name' => '_wp_sqlite__tmp_table_created_at_on_update', | |||
'tbl_name' => '_tmp_table', | |||
'rootpage' => '0', | |||
'sql' => "CREATE TRIGGER \"_wp_sqlite__tmp_table_created_at_on_update\"\n\t\t\tAFTER UPDATE ON \"_tmp_table\"\n\t\t\tFOR EACH ROW\n\t\t\tBEGIN\n\t\t\t UPDATE \"_tmp_table\" SET \"created_at\" = CURRENT_TIMESTAMP WHERE rowid = NEW.rowid;\n\t\t\tEND", | |||
'sql' => "CREATE TRIGGER `_wp_sqlite__tmp_table_created_at_on_update`\n\t\t\t\tAFTER UPDATE ON `_tmp_table`\n\t\t\t\tFOR EACH ROW\n\t\t\t\tBEGIN\n\t\t\t\t UPDATE `_tmp_table` SET `created_at` = CURRENT_TIMESTAMP WHERE rowid = NEW.rowid;\n\t\t\t\tEND", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit – how about heredoc or nowdoc to make it a bit more readable for humans?
@@ -1234,7 +1233,10 @@ private function execute_create_table_statement( WP_Parser_Node $node ): void { | |||
if ( $subnode->has_child_node( 'ifNotExists' ) ) { | |||
$tables_table = $this->information_schema_builder->get_table_name( $table_is_temporary, 'tables' ); | |||
$table_exists = $this->execute_sqlite_query( | |||
"SELECT 1 FROM $tables_table WHERE table_schema = ? AND table_name = ?", | |||
sprintf( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the purposes of this PR it's lovely. Thank you for all the thorough escaping!
Future-wise, we should start a separate discussion about query formatting. Having two formatting systems is not ideal, even if only because the sprintf()
caller may accidentally pull an extra ?
into the query. CC @dmsnell
That being said, I wouldn't worry about that too much here. We have to stop somewhere and that seems like a reasonable boundary.
* wrapper that leaks some of the PDO APIs (returns PDOStatement values, etc.). | ||
* In the future, we may abstract it away from PDO and support SQLite3 as well. | ||
*/ | ||
class WP_SQLite_Connection { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about extends \PDO
as a way of enforcing the interface? Then, we can preface this with if(!class_exists('\PDO')) { class \PDO {} }
to prevent fatal errors on PDO-less installations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wonder about clarity. Since this is SQLite-specific, in the future we could have a separate WP_MySQL_Over_SQLite_Connection
class that will expect a different SQL dialect and would expose MySQL-specific quote
and quote_identifier
implementation. Nothing actionable here, I'm just thinking out loud.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adamziel My thinking here was making an "internal" wrapper of PDO\SQLite
or SQLite3
(or both) for connection to the SQLite database. The part that will need to expose a unified API (e.g., PDO
-like), is the current WP_SQLite_Driver
, I think.
But I see making this adhere to the PDO API may make be useful as well, especially to avoid reinventing the APIs.
As for the clarity of the APIs, maybe we could use the DSN prefix as PDO does? Something like:
$connection = new WP_PDO('mysql:host=localhost;dbname=wordpress'); // MySQL
$connection = new WP_PDO('sqlite:dbname=wordpress'); // SQLite -- this is the one we're discussing now
$connection = new WP_PDO('mysql-on-sqlite:dbname=wordpress'); // MySQL on SQLite -- this is the new driver
$connection = new WP_PDO('pgsql:host=localhost;dbname=wordpress'); // Postgres
$connection = new WP_PDO('pgsql-on-sqlite:dbname=wordpress'); // Postgres on SQLite
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first, it felt wrong, but the more I think about it the more I like it. It's the most connection interface we can possibly offer. Really nice idea. If I'm to nitpick, maybe I'd do WP_PDO::connect()
instead of new WP_PDO
just to return the correct class instance right away and save on WP_PDO
proxy where each method just calls $this->actual_driver->method()
@@ -1353,7 +1350,7 @@ private function get_column_data_types( WP_Parser_Node $node ): array { | |||
) { | |||
$type = 'mediumtext'; | |||
} else { | |||
throw new \RuntimeException( 'Unknown data type: ' . $token->value ); | |||
throw new \RuntimeException( 'Unknown data type: ' . $token->get_value() ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow-up PR idea: use a SQL-specific exception class.
This is looking really good. It seems like the only blocking thing is string escaping in the |
This PR extracts existing SQLite database configuration to a
WP_SQLite_Configurator
and implementsWP_SQLite_Information_Schema_Reconstructor
with the ability to backfill missing table data in information schema.How it works:
WP_SQLite_Configurator::ensure_database_configured()
), while acquiring anEXCLUSIVE
lock to prevent race conditions (multiple requests triggering the same migration).WP_SQLite_Information_Schema_Reconstructor
.wp_get_db_schema()
._mysql_data_types_cache
, if it exists, otherwise directly from the SQLite schema using SQLite column affinity rules.